multi-agent generative adversarial imitation learning
Review for NeurIPS paper: Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
Additional Feedback: I appreciate that the authors evaluated their method from several perspectives, but experience sharing has occasionally appeared in the existing literature. For example, although it is not mentioned in the paper, [1] used experience sharing among agents in its implementation, and I believe there may be other works on the topic of "MARL for homogeneous agents". The main reason I score this "below acceptance" is that the baselines used appear to be quite weak:
- In Table 1, QMIX and MADDPG substantially underperform SEAC and the other baselines (IAC, SNAC). Since CTDE methods are generally more stable than independent learning methods, I think this result should be explained in more detail.
Although other reviewers have argued that the strength of this work lies in its importance weighting and the simplicity of the method, I still think stronger baselines should have been included.
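For readers unfamiliar with the importance weighting mentioned in this review, the following is a minimal sketch of how experience collected by another agent could be reweighted before it contributes to a learner's policy loss. The function names, the `lam` coefficient, and the exact form of the loss are assumptions made for illustration; they are not taken from the paper under review, and a real implementation would use an autodiff framework and treat the weight as a constant.

```python
import numpy as np

def shared_experience_weights(logp_learner, logp_behaviour):
    """Importance ratio pi_learner(a|o) / pi_behaviour(a|o), from log-probabilities."""
    return np.exp(np.asarray(logp_learner) - np.asarray(logp_behaviour))

def shared_policy_loss(own_logp, own_adv,
                       other_logp_learner, other_logp_behaviour, other_adv,
                       lam=1.0):
    """Hypothetical policy loss: on-policy term plus a weighted shared-experience term.

    own_logp / own_adv: log-probabilities and advantages on the learner's own data.
    other_*: the same quantities on data collected by another agent, where
    other_logp_behaviour is the log-probability under the agent that acted.
    lam: assumed coefficient scaling the shared-experience term.
    """
    own_logp = np.asarray(own_logp)
    own_adv = np.asarray(own_adv)
    other_logp_learner = np.asarray(other_logp_learner)
    other_adv = np.asarray(other_adv)

    # Standard policy-gradient surrogate on the learner's own experience.
    own_term = -np.mean(own_logp * own_adv)

    # Off-policy correction for experience generated by a different policy.
    w = shared_experience_weights(other_logp_learner, other_logp_behaviour)
    shared_term = -np.mean(w * other_logp_learner * other_adv)

    return own_term + lam * shared_term
```

When both agents assign the same probability to a shared action, the weight is 1 and the shared sample counts as if it were on-policy; as the policies diverge, the weight scales that sample's contribution up or down.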
Reviews: Multi-Agent Generative Adversarial Imitation Learning
This paper proposes several alternative extensions of GAIL to multi-agent imitation learning settings. The paper includes strong, positive results on a wide range of environments against a suitable selection of baselines. However, insufficient detail about the environments is provided to reproduce the results or to fully appreciate the complexity of these environments. If accepted, I would request that the authors add these details to the appendix, and I would appreciate it if they were discussed in the rebuttal (space permitting), particularly the state representation. The more pressing point I would like to raise for discussion in the rebuttal concerns the MACK algorithm proposed for the generator. The authors make a justified argument for the novelty of the algorithm, but do not thoroughly justify why they used this algorithm instead of an established MARL algorithm (e.g.
Multi-Agent Generative Adversarial Imitation Learning
Song, Jiaming, Ren, Hongyu, Sadigh, Dorsa, Ermon, Stefano
Imitation learning algorithms can be used to learn a policy from expert demonstrations without access to a reward signal. However, most existing approaches are not applicable in multi-agent settings due to the existence of multiple (Nash) equilibria and non-stationary environments. We propose a new framework for multi-agent imitation learning for general Markov games, where we build upon a generalized notion of inverse reinforcement learning. We further introduce a practical multi-agent actor-critic algorithm with good empirical performance. Our method can be used to imitate complex behaviors in high-dimensional environments with multiple cooperative or competing agents. Papers published at the Neural Information Processing Systems Conference.